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Abstract. Recently alternative approaches in cosmology seeks to explain the nature of dark matter 
as a direct result of the non-linear spacetime curvature due to different types of deformation 
potentials. In this context, a key test for this hypothesis is to examine the effects of deformation on 
the evolution of large scales structures. An important requirement for the fine analysis of this pure 
gravitational signature (without dark matter elements) is to characterize the position of a galaxy 
during its trajectory to the gravitational collapse of super clusters at low redshifts. In this context, 
each element in an gravitational N-body simulation behaves as a tracer of coUapse governed by the 
process known as chaotic advection (or lagrangian turbulence). In order to develop a detailed study 
of this new approach we develop the COsmic LAgrangian TUrbulence Simulator (COLATUS) to 
perform gravitational N-body simulations based on Compute Unified Device Architecture (CUDA) 
for graphics processing units (GPUs). In this paper we report the first robust results obtained from 
COLATUS. 
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I. INTRODUCTION 

The gravitational N-body simulations have become a powerful tool for testing the theo- 
ries of structure formation in astrophysical and cosmological systems [1]. In particular, 
it has been shown that the statistical characterization of dark matter distribution is an im- 
portant ingredient in the investigation of large-scale structure formation in the Hubble 
volume simulated from the GADGET- VC algorithm [2]. 

Recently, an established statistical method was used to demonstrate the importance of 
considering chaotic advection (or Lagrange Turbulence) [3] in combination with gravi- 
tational instabilities in the A - CDM simulations performed from the Virgo Consortium 
(VC) [4]. However, the GADGET- VC algorithm does not allow in a straightforward 
approach the computation of the kinematics of a single particle, requirement which 
is necessary for the investigation of the chaotic advection in alternative cosmological 
models considering only barionic matter [6]. This limitation appears because the in- 
teraction forces are computed by the TreePM scheme [5]. The COsmic LAgrangian 



Turbulence Simulator (COLATUS) is a new algorithm to perform gravitational N-body 
simulations allowing the computation of the velocity of a single particle at every time- 
step and then the evaluation of its energy power spectrum. To achieve its objective CO- 
LATUS compute the gravitational forces by using a direct summation scheme(the sim- 
plest PP scheme). COLATUS is implemented in a Compute Unified Device Architecture 
(CUDA) by using the Nvidia graphics processing units (GPUs) to reduce the simulation 
runtime. In the present work we show the preliminary simulations including up to 32^ 
particles using 1536 cores of NVIDIA GTX680. 



II. GPU/CUDA COSMOLOGICAL N-BODY SIMULATION 

For N-bodies under the influence of physical gravitational forces in a comoving 
coordinates(x|^ — Tl/a) the motion equations take the form: 

where e is the softening factor(collisionless assumption), a is the scale factor which is 
taken to be 1 at the present time and it changes ruled by the Friedmann equation: 
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where CIr is the radiation density, Qm is the matter (dark plus baryonic) density, CIk is 

spatial curvature density, Q.^ is the cosmological constant or vacuum density today and 
Hq is the Hubble parameter for z = 0. These equations are valid for non-relativistic matter 
(v << c,4> << c^) at scales that are much smaller than the Hubble radius (L << c/Hq) 
[17]. The equation of motions can be integrated using the difference equations [24]. 

^n+l/2 ^ (3) 
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However, these equations are not symplectic, but by making a suitable canonical trans- 
formation, one can derive an integrator that is symplectic [20, 22]. 
The Lagrangian for the particle motion in the comoving frame is: 
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Switching to the Hamiltonian formalism, the momentum canonical to 1^ is = ma 
and the Hamiltonian is: 

H = -^+m^^-^ (7) 
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then 
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Although this Hamiltonian is time-dependent, it is separable, so the "drift" and "kick" 
operators are easily derived as: 



where T is the step size. 

Note that, the gravitational force in the Newtonian limit falls as 1/r^, hence it is a long 
range force and we cannot ignore force due to distant particles. This makes calculation 
of forces the most time consuming task in N-Body simulations. As a result, a lot of 
attention has been focused on this aspect and many algorithms and optimizing schemes 
have been developed, we refer the reader to [1, 8] for a detailed review. These schemes 
have been evolved to address this problem and replace direct summation by methods that 
are less time consuming. However the direct summation implemented in GPU/CUDA 
approach provide at least three features that help it achieve high efficiency [18]: (i) 
Straightforward parallelism with sequential memory access patterns; (ii) Data reuse 
that keeps the arithmetic units busy; and (iii) The fully pipelined arithmetic, including 
complex operations such as inverse square root, that are much faster clock-for-clock on 
a GPU than on a CPU. 

The standard model that explains the large scale structure formation (e.g. galaxies, 
clusters of galaxies) assume that they were formed from the amplification of the small 
initial fluctuations via gravitational instability [7]. These models attempt to reduce 
cosmology to an initial value problem [1]. Then, given the initial conditions(universe 
at high redshift) and a cosmological model, the goal is to compute the evolution of 
structure until z = 0. The initial conditions are a density-velocity field represented by a 
set of particles which must be set-up in compatibility with of the observed primordial 
universe on cosmic microwave background radiation [15, 16, 13]. We generate the initial 
conditions by imposing perturbations on an initially uniform state represented by a 
"glass" distribution of particles generated by the method of White (1996). Using the 
algorithm described by [11], which is based on the Zeldovich approximation [15], a 
Gaussian random field is set up by perturbing the positions of the particles and assigning 
them velocities according to growing mode linear theory solutions [12]. 

The cosmological models specifies the universe composition(matter, radiation, space- 
time curvature) and attempt to describes its evolution and interaction, they usually in- 
clude some exotic ingredients as the dark matter and dark energy to explain anomalous 
galactic rotation curves and the accelerated expansion, respectively. Assuming that the 
gravitation is the dominant force (e.g. [8]) we compute the structure evolution by inte- 
grating the motion equations and computing the N-body interaction forces in every time- 
step. The motion equations are in comoving spatial coordinates to consider the universe 
expansion and boundary conditions are periodic to be consistent with a homogeneous 
elements distribution [1, 8, 12]. The solutions of the motion equations would be exact if 
we were able to simulate the motions of all individual bodies. Unfortunately this can not 
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be achieved since the systems of interest (cosmological large scale structures) contain of 
the order of 10^*^ stars. We approximate collection of a very large number of stars in the 
universe by one galaxy in an N-Body simulation. Therefore the particles in an N-Body 
simulation must interact in a purely coUisionless manner [9, 10]. The mass of a galaxy 
(typically of the order of 0.1 to 1 x lO^^M© ) is adjusted in its normalized form (0.1-1) 
in the initial condition hypercube. 

Based on the formalism described above we have implemented the algorithm COLA- 
TUS that is described in a simplified form in Appendix A. 

III. PRELIMINARY RESULTS 

The first main objective to be reached with the COLATUS is to show that simulations 
with the simplest type of particle-particle (PP) gravitational interaction, containing typ- 
ical densities of barionic content only, one can get collapsed patterns consistent with 
gravitational simulations that include dark matter. 




FIGURE 1. Three snapshots from a COLATUS simulation with N = 32^ Box size L = lOOMpc, IC= 
Eisenstein and Hu Spectrum (z — 50). 

Figure 1 depicts three snapshots of the large-scale structure formation for a cosmo- 
logical box containing 32^ galaxies in the critical density distributed in a volume of 
(lOOMpc)^ which evolves over 51 units of redshift. The spectrum for the initial distri- 
bution of particles follows the Eisenstein-Hu law [23]. For z = we have computed the 

Nf 1. 

relation between the number of filaments and the number of voids, = ^, and the aver- 
ages for the filament lengths Lf = SAMpc, filament thickness Lft = \33Mpc and void 
lenghts Ly = 39, 33Mpc, which are consistent with those obtained by simulations based 
on more sophisticated supercomputers (see for example [5] Virgo and Millenium). 

For the next simulations we intend to begin the study of the deformation potential 
influence on the large scale structures patterns (filament thickness, filament and void 
lenghts) obtained for z = 0. In order to implement this study we will modify the equation 



(1) by introducing a deformation potential 




as follow: 
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where C is a constant and Xy/ is the position where the potential has it max value. Note 
that potential force fall as l/r2, then it acts like a mass particle located in a fixed position. 



We show here a new way to apply simulations of gravitational N-body using CUDA 
/ GPU to test cosmological models that attempt to explain dark matter from effects 
due only to the curvature of spacetime. Preliminary results indicate that the algorithm 
is able to simulate the generation of large-scale structures whose statistical properties 
are consistent with simulations that use of more sophisticated numerical schemes for 
supercomputers based on CPUs. 

An important fact in this new algorithm is that it allows to directly follow the trajectory 
of individual galaxies during the evolution of the universe, thus allowing to test, with 
high precision, the effects of possible global and local deformations of the Lagrangian 
particle description in spacetime. In this context, the statistical methods for structural 
fine analysis should be those commonly used to characterize the process of chaotic 
advection which tracers moving in turbulent fluids are subject. In our cosmological 
approach, galaxy clusters are formed only by baryonic matter that also feel the effect 
of deformation that will intensify the effect of gravitational instabilities inherent in the 
system. 



The COLATUS algorithm implement the direct summation scheme by using a code 
written in CUDA. On the GPUs each thread compute the force, acceleration, position 
and velocity of each particle. The sequence of the algorithm is as follows: 

1. First, the CPU reads the initial conditions from a datafile and copy the initial 
positions and velocities to the global memory; 

2. On the CPU, the main program allocate the Memory on the GPU; 

3. On the CPU, the main program copy the initial conditions to the GPU; 

4. On the CPU, the main program initialize the threads on the GPU each of them will 
perform the calculations; 

5. On the GPU, the forces are calculated by the threads. 

6. On the GPU, we integrate the motion equation to actualize the particles velocities 
and positions. We are testing the viability of make the numerical integration on the 
CPU, then we will use the GPU only for calculation of the iteration forces (e.g. 



IV. CONCLUDING REMARKS 



APPENDIX A 



[21])). 



7. The process is repeated and concurrently other library shows a particle motion in 
the box. 

8. At every time-step the GPU copy to the CPU the particle position and velocity. 

Below is an excerpt from the main part of the code in CUDA which is responsible for 
calculating the particle -particle gravitational interaction. 



1 device float4 computeDistance (f loat4 posl, float4 pos2, 

2 float min_distance) { 

3 ... 

4 } 



5 global void f orce_calculation (f loat4* position, float4 * velocity, 

6 float4* accell, int numbodies, float g, 

7 bool useCollision, float min_dist, float dt, floats box_dim) { 

8 int bx = blockldx.x; int tx = threadldx.x; 

9 int dimX = blockDim.x; int idx = bx * dimX + tx; 



10 float4 acc = make_float4 (0, 0, 0, 0) ; 

11 float4 netForce = make_f loat4 (0, 0, 0, 0) ; 

12 float4 pos = position [ idx] ; float4 vel = velocity [ idx] ; 

13 for (int 1=0; i<numbodies; i++) { 



14 if (idx != i) { 

15 float4 directionDist = computeDistance (position [idx] , position [i] 

16 ,min_dist); float mass2 = position [ i ]. w; 

17 Force. X += g*mass2 * directionDist . x * directionDist . w; 

18 Force. y += g*mass2 * directionDist . y * directionDist . w; 

19 Force. z += g*mass2 * directionDist . z * directionDist . w; 

} } 



21 //Numerical Integration 

22 . . 

23 } 
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